A visual language-based system for extraction-transformation-loading development

Authors

  • Vincenzo Deufemia
  • Massimiliano Giordano
  • Giuseppe Polese
  • Genny Tortora
Abstract

Data warehouse loading and refreshment is typically performed by means of complex software processes called extraction–transformation–loading (ETL). In this paper, we propose a system based on a suite of visual languages for mastering several aspects of the ETL development process, turning it into a visual programming task. The approach can be easily generalized and applied to other data integration contexts beyond data warehouses. It introduces two new visual languages that are used to specify the ETL process, which can also be represented by means of UML activity diagrams. In particular, the first visual language supports data manipulation activities, whereas the second one provides traceability information of attributes to highlight the impact of potential transformations on integrated schemas depending on them. Once the whole ETL process has been visually specified, the designer might invoke the automatic generation of an activity diagram representing a possible orchestration of it based on its dependencies. The designer can edit such a diagram to modify the proposed orchestration provided that changes do not alter data dependencies. The final specification can be translated into code that is executable on the data sources. Finally, the effectiveness of the proposed approach has been validated through a user study in which we have compared the effort needed to design an ETL process in our approach with respect to the one required with main visual approaches described in the literature. Copyright © 2013 John Wiley & Sons, Ltd.
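The abstract mentions generating a possible orchestration from the dependencies of the ETL process and allowing edits only when they leave data dependencies intact. A minimal sketch of that idea follows, assuming the dependencies can be expressed as a map from each activity to the activities whose output it reads. The activity names, `DEPENDENCIES`, `propose_orchestration`, and `respects_dependencies` are all hypothetical, and the topological sort used here is a stand-in technique, not the authors' generation algorithm.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical ETL activities, each mapped to the activities whose output it reads.
DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_customers": {"extract_customers"},
    "join_orders_customers": {"extract_orders", "clean_customers"},
    "load_sales_fact": {"join_orders_customers"},
}


def propose_orchestration(dependencies):
    """Return one execution order in which every activity follows its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


def respects_dependencies(order, dependencies):
    """Check that a (possibly hand-edited) order still honours every dependency."""
    # The order must contain each activity exactly once.
    if set(order) != set(dependencies) or len(order) != len(dependencies):
        return False
    position = {activity: index for index, activity in enumerate(order)}
    return all(
        position[upstream] < position[activity]
        for activity, upstream_set in dependencies.items()
        for upstream in upstream_set
    )


if __name__ == "__main__":
    proposed = propose_orchestration(DEPENDENCIES)
    print(proposed)
    print(respects_dependencies(proposed, DEPENDENCIES))
```

In this sketch a designer's edit corresponds to reordering the proposed list; `respects_dependencies` accepts the edit only if no activity is moved ahead of something it reads from.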


Similar resources

Developing Reliable yet Flexible Software through If-Then Model Transformation Rules

Developing reliable yet flexible software is a hard problem. Although modeling methods enjoy a lot of advantages, the exclusive use of just one of them, in many cases, may not guarantee the development of reliable and flexible software. Formal modeling methods ensure reliability because they use a rigorous approach to software development. However, lack of knowledge and high cost practically fo...


Reduced-Reference Image Quality Assessment based on saliency region extraction

In this paper, a novel saliency theory based RR-IQA metric is introduced. As the human visual system is sensitive to the salient region, evaluating the image quality based on the salient region could increase the accuracy of the algorithm. In order to extract the salient regions, we use blob decomposition (BD) tool as a texture component descriptor. A new method for blob decomposition is propos...


Efficient communication in financial data warehousing projects - Insights from a multiple case study

Data warehouses play important roles in the IT landscape of the financial industry. Banks have to deal with complex communication issues in financial data warehouse projects. Especially the creation of extraction, transformation and loading (ETL) processes depends on the project team’s communication ability and given communication barriers. We briefly present a theoretical efficiency model base...


Development and Practical Application of a Bridge Management System (J-BMS) in Japan

This paper presents a new bridge management system (J-BMS). It is integrated with a concrete bridge rating expert system that can be used to evaluate the serviceability of existing concrete bridges. The proposed J-BMS not only evaluates the performance of bridges, but also offers a rehabilitation strategy based on a combination of maintenance cost minimization and quality maximization. In this ...


Applying UML for Modeling the Physical Design of Data Warehouses

In previous work, we have shown how to use unified modeling language (UML) as the primary representation mechanism to model conceptual design, logical design, modeling of extraction, transformation, loading (ETL) processes, and defining online analytical processing (OLAP) requirements of data warehouses (DW). Continuing our work on using UML throughout the DW development lifecycle, in this chap...



Journal:
  • Softw., Pract. Exper.

Volume 44, issue not given

Pages: not given

Publication date: 2014